phishing detection
Role-Aware Multi-modal federated learning system for detecting phishing webpages
Wang, Bo, Khan, Imran, White, Martin, Beloff, Natalia
We present a federated, multi-modal phishing website detector that supports URL, HTML, and IMAGE inputs without binding clients to a fixed modality at inference: any client can invoke any modality head trained elsewhere. Methodologically, we propose role-aware bucket aggregation on top of FedProx, inspired by Mixture-of-Experts and FedMM. We drop learnable routing and use hard gating (selecting the IMAGE/HTML/URL expert by sample modality), enabling separate aggregation of modality-specific parameters to isolate cross-embedding conflicts and stabilize convergence. On TR-OP, the Fusion head reaches Acc 97.5% with FPR 2.4% across two data types; on the image subset (ablation) it attains Acc 95.5% with FPR 5.9%. For text, we use GraphCodeBERT for URLs and an early three-way embedding for raw, noisy HTML. On WebPhish (HTML) we obtain Acc 96.5% / FPR 1.8%; on TR-OP (raw HTML) we obtain Acc 95.1% / FPR 4.6%. Results indicate that bucket aggregation with hard-gated experts enables stable federated training under strict privacy, while improving the usability and flexibility of multi-modal phishing detection.
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
A Gradient-Optimized TSK Fuzzy Framework for Explainable Phishing Detection
Pentapalli, Lohith Srikanth, Salisbury, Jon, Riep, Josette, Cohen, Kelly
Phishing attacks represent an increasingly sophisticated and pervasive threat to individuals and organizations, causing significant financial losses, identity theft, and severe damage to institutional reputations. Existing phishing detection methods often struggle to simultaneously achieve high accuracy and explainability, either failing to detect novel attacks or operating as opaque black-box models. To address this critical gap, we propose a novel phishing URL detection system based on a first-order Takagi-Sugeno-Kang (TSK) fuzzy inference model optimized through gradient-based techniques. Our approach intelligently combines the interpretability and human-like reasoning capabilities of fuzzy logic with the precision and adaptability provided by gradient optimization methods, specifically leveraging the Adam optimizer for efficient parameter tuning. Experiments conducted using a comprehensive dataset of over 235,000 URLs demonstrate rapid convergence, exceptional predictive performance (accuracy averaging 99.95% across 5 cross-validation folds, with a perfect AUC i.e. 1.00). Furthermore, optimized fuzzy rules and membership functions improve interoperability, clearly indicating how the model makes decisions - an essential feature for cybersecurity applications. This high-performance, transparent, and interpretable phishing detection framework significantly advances current cybersecurity defenses, providing practitioners with accurate and explainable decision-making tools.
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Nepal (0.04)
Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction
With the rise of sophisticated phishing attacks, there is a growing need for effective and economical detection solutions. This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs, thus avoiding the complexities of training and maintaining AI systems. Our findings indicate that integrating these two data types substantially enhances detection performance over using either type alone. However, API usage incurs costs per query that depend on the number of input and output tokens. To address this, we propose a two-tiered agentic approach: initially, one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot. This method not only maintains robust detection performance but also significantly reduces API costs by minimizing unnecessary multi-input queries. Cost analysis shows that with the agentic approach, GPT-4o mini can process about 4.2 times as many websites per $100 compared to the multimodal approach (107,440 vs. 25,626), and Gemini 1.5 Flash can process about 2.6 times more websites (2,232,142 vs. 862,068). These findings underscore the significant economic benefits of the agentic approach over the multimodal method, providing a viable solution for organizations aiming to leverage advanced AI for phishing detection while controlling expenses.
Comparative Analysis of Black-Box and White-Box Machine Learning Model in Phishing Detection
Fajar, Abdullah, Yazid, Setiadi, Budi, Indra
Background: Explainability in phishing detection model can support a further solution of phishing attack mitigation by increasing trust and understanding how phishing can be detected. Objective: The aims of this study to determine and best recommendation to apply an approach which has several components with abilities to fulfil the critical needs Methods: A methodology starting with analyzing both black-box and white-box models to get the pros and cons specifically in phishing detection. The conclusion of the analysis will be validated by experiment using a set of well-known algorithms and public phishing datasets. Experimental metrics covers 3 measurements such as predictive accuracy and explainability metrics. Conclusion: Both models are comparable in terms of interpretability and consistency, with room for improvement in diverse datasets. EBM as an example of white-box model is generally better suited for applications requiring explainability and actionable insights. Finally, each model, white-box and black-box model has positive and negative aspects both for performance metric and for explainable metric. It is important to consider the objective of model usage.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Netherlands (0.04)
- Asia > Nepal (0.04)
- (2 more...)
To Ensemble or Not: Assessing Majority Voting Strategies for Phishing Detection with Large Language Models
The effectiveness of Large Language Models (LLMs) significantly relies on the quality of the prompts they receive. However, even when processing identical prompts, LLMs can yield varying outcomes due to differences in their training processes. To leverage the collective intelligence of multiple LLMs and enhance their performance, this study investigates three majority voting strategies for text classification, focusing on phishing URL detection. The strategies are: 1) a prompt-based ensemble, which utilizes majority voting across the responses generated by a single LLM to various prompts 2) a model-based ensemble, which entails aggregating responses from multiple LLMs to a single prompt; and 3) a hybrid ensemble, which combines the two methods by sending different prompts to multiple LLMs and then aggregating their responses. Our analysis shows that ensemble strategies are most suited in the cases where individual components--whether prompts or LLMs--exhibit equivalent performance levels. However, when there is a significant discrepancy in individual performance, the effectiveness of the ensemble method may not exceed that of the highest-performing single LLM or prompt. In such instances, opting for ensemble techniques is not recommended.
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.05)
- North America > Canada (0.04)
Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models
Fajar, Abdullah, Yazid, Setiadi, Budi, Indra
Phishing attacks remain a persistent threat to online security, demanding robust detection methods. This study investigates the use of machine learning to identify phishing URLs, emphasizing the crucial role of feature selection and model interpretability for improved performance. Employing Recursive Feature Elimination, the research pinpointed key features like "length_url," "time_domain_activation" and "Page_rank" as strong indicators of phishing attempts. The study evaluated various algorithms, including CatBoost, XGBoost, and Explainable Boosting Machine, assessing their robustness and scalability. XGBoost emerged as highly efficient in terms of runtime, making it well-suited for large datasets. CatBoost, on the other hand, demonstrated resilience by maintaining high accuracy even with reduced features. To enhance transparency and trustworthiness, Explainable AI techniques, such as SHAP, were employed to provide insights into feature importance. The study's findings highlight that effective feature selection and model interpretability can significantly bolster phishing detection systems, paving the way for more efficient and adaptable defenses against evolving cyber threats
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.85)
- (3 more...)
NoPhish: Efficient Chrome Extension for Phishing Detection Using Machine Learning Techniques
Thaqi, Leand, Halili, Arbnor, Vishi, Kamer, Rexha, Blerim
The growth of digitalization services via web browsers has simplified our daily routine of doing business. But at the same time, it has made the web browser very attractive for several cyber-attacks. Web phishing is a well-known cyberattack that is used by attackers camouflaging as trustworthy web servers to obtain sensitive user information such as credit card numbers, bank information, personal ID, social security number, and username and passwords. In recent years many techniques have been developed to identify the authentic web pages that users visit and warn them when the webpage is phishing. In this paper, we have developed an extension for Chrome the most favorite web browser, that will serve as a middleware between the user and phishing websites. The Chrome extension named "NoPhish" shall identify a phishing webpage based on several Machine Learning techniques. We have used the training dataset from "PhishTank" and extracted the 22 most popular features as rated by the Alexa database. The training algorithms used are Random Forest, Support Vector Machine, and k-Nearest Neighbor. The performance results show that Random Forest delivers the best precision.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Europe > Kosovo > District of Pristina > Pristina (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Position Paper: Think Globally, React Locally -- Bringing Real-time Reference-based Website Phishing Detection on macOS
Petrukha, Ivan, Stulova, Nataliia, Kryvoblotskyi, Sergii
Background. The recent surge in phishing attacks keeps undermining the effectiveness of the traditional anti-phishing blacklist approaches. On-device anti-phishing solutions are gaining popularity as they offer faster phishing detection locally. Aim. We aim to eliminate the delay in recognizing and recording phishing campaigns in databases via on-device solutions that identify phishing sites immediately when encountered by the user rather than waiting for a web crawler's scan to finish. Additionally, utilizing operating system-specific resources and frameworks, we aim to minimize the impact on system performance and depend on local processing to protect user privacy. Method. We propose a phishing detection solution that uses a combination of computer vision and on-device machine learning models to analyze websites in real time. Our reference-based approach analyzes the visual content of webpages, identifying phishing attempts through layout analysis, credential input areas detection, and brand impersonation criteria combination. Results. Our case study shows it's feasible to perform background processing on-device continuously, for the case of the web browser requiring the resource use of 16% of a single CPU core and less than 84MB of RAM on Apple M1 while maintaining the accuracy of brand logo detection at 46.6% (comparable with baselines), and of Credential Requiring Page detection at 98.1% (improving the baseline by 3.1%), within the test dataset. Conclusions. Our results demonstrate the potential of on-device, real-time phishing detection systems to enhance cybersecurity defensive technologies and extend the scope of phishing detection to more similar regions of interest, e.g., email clients and messenger windows.
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Switzerland (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
AI-based email security provider Cyberfish acquired by Cofense
US-based phishing detection and response solution provider Cofense announced last week the acquisition of Israel's Cyberfish, a provider of next-generation phishing protection powered by computer vision and advanced machine learning technology. Cofense said that by integrating innovative machine learning capabilities from Cyberfish with Cofense's detection and response technology, the American company will bring to market a holistic, advanced automation solution for email protection, detection, and response. According to Cofense, organizations are rethinking their email security architecture and technology stack with the acceleration of digital transformation and migration to cloud email services from Microsoft 365 and Google Workspace. Cofense's phishing intelligence, already deployed in thousands of enterprises around the world, will be used to train and evolve Cyberfish's machine-learning algorithms to block malicious emails in real-time, the company said. "Cyberfish is excited to join forces with Cofense to solve the phishing problem and take email security to another level," said Dima Kagan, Cyberfish CEO and Co-founder.
Phishing Detection Using Machine Learning Techniques
The Internet has become an indispensable part of our life, However, It also has provided opportunities to anonymously perform malicious activities like Phishing. Phishers try to deceive their victims by social engineering or creating mock-up websites to steal information such as account ID, username, password from individuals and organizations. Although many methods have been proposed to detect phishing websites, Phishers have evolved their methods to escape from these detection methods. One of the most successful methods for detecting these malicious activities is Machine Learning. This is because most Phishing attacks have some common characteristics which can be identified by machine learning methods.